The Organisation for Economic Co-operation and Development (OECD) is a global organization that aims to create better policies for better lives. Its mission is to create policies that promote prosperity, equality, opportunity, and well-being for all. The Programme for International Student Assessment (PISA) is one of the OECD’s programmes. PISA assesses 15-year-old students’ ability to apply their knowledge and skills in reading, mathematics, and science to real-world challenges. The OECD launched PISA in 1997; it was first administered in 2000 and currently includes over 80 nations. The PISA study, conducted every three years, provides comparative statistics on 15-year-olds’ performance in reading, maths, and science. This paper describes how to use the learningtower package, which offers OECD PISA datasets from 2000 to 2018 in an easy-to-use format. These datasets comprise information on students’ test results and other socioeconomic factors, as well as information on their schools, infrastructure and the countries participating in the program.
PISA assesses the extent to which children approaching the end of compulsory school have learned some of the information and abilities required for full participation in modern society, notably in maths, reading, and science. The examination focuses on reading, mathematics, science, and problem solving. It also assesses students’ capacity to reproduce information, to extrapolate from what they have learned, and to apply that knowledge in unfamiliar circumstances, both inside and outside of school. This approach reflects the fact that individuals are rewarded in modern economies not for what they know, but for what they can accomplish with what they know.
This evaluation, which is carried out every three years, helps to track students’ development of knowledge and skills throughout the world, which can provide actionable insights for education policymakers. PISA is well known for its distinctive testing characteristics, which include policy orientation, an innovative notion of literacy, relevance to lifelong learning, regularity, and breadth of coverage. PISA is now used as an assessment tool in many regions around the world. In addition to OECD member countries, the survey has been or is being conducted in East, South and Southeast Asia, Central, Mediterranean and Eastern Europe, Central Asia, the Middle East, Central and South America, and Africa.
For each year of the PISA study, one domain subject is examined in depth. In 2018, for example, reading was the major domain, with mathematics and science as minor areas of assessment. The 2012 survey concentrated on mathematics, with reading, science, and problem solving serving as minor evaluation topics. PISA targets a specific age group of students in order to compare their performance properly worldwide. PISA students are aged between 15 years 3 months and 16 years 2 months at the time of the assessment, and have completed at least 6 years of formal schooling. They may be enrolled in any type of institution, participate in full-time or part-time education, follow academic or vocational programs, and attend public, private, or international schools inside the country. Using this age across nations and over time allows PISA to compare the knowledge and abilities of people born in the same year who are still in school at the age of 15, irrespective of their diverse schooling.
The PISA test is primarily computer-based and lasts around 2 hours. The examination comprises both multiple choice and free entry questions. Some countries that were not ready for computer-based delivery carried out the testing on paper. Each student may receive a different set of questions. An example of the test may be seen here. The PISA assessment areas seek to measure the following aspects of students’ literacy in math, reading, and science. Mathematical literacy assesses students’ ability to grasp and interpret mathematics in a variety of settings. Reading literacy assesses students’ capacity to absorb, apply, analyze, and reflect on texts in order to attain required goals and participate in society. Science literacy is described as the ability to engage with science-related issues and scientific concepts as a reflective citizen.
PISA data is publicly accessible for download. However, reading the data documentation reveals that the disclosed PISA scores are generated using a sophisticated linear model applied to the data. For each student, several values are simulated. This is known as synthetic data, and it is a popular technique for ensuring data privacy. The data can still be deemed accurate within the mean, variance, and stratum used in the original data’s modelling. In addition, the PISA website provides the data in SPSS and SAS formats, which can limit accessibility because both are commercial software. Furthermore, all questions are assigned unique IDs within each year of the PISA study, but these IDs do not always agree across years. This data has now been curated and simplified into a single R package called learningtower, which contains all of the PISA scores from the years 2000 to 2018.
Each developer at the ROpenSci OzUnconf was assigned to curate a specific year of the PISA study. Data on the participating students and schools were first downloaded from the PISA website, in either SPSS or SAS format. The data were read into an R environment, with the exception of the years 2000 and 2003. Due to formatting issues, the data for these two years were first read using SPSS and then exported into compatible .sav files. After some data cleaning and wrangling with the appropriate script, the variables of interest were re-categorised and saved as RDS files. One major challenge faced by the developers was to ensure the consistency of variables over the years. For example, a student’s mother’s highest level of education was never recorded in 2000, but it was recorded as “ST11R01” between 2003 and 2012 and as “ST005Q01TA” between 2015 and 2018. This problem was tackled manually by curating these values as an integer variable named “mother_educ” in the output data. The final RDS files for each PISA year were then thoroughly vetted and made available in a separate GitHub repository.
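The renaming step described above can be illustrated with a minimal sketch. It is not the developers' actual curation script: the lookup table `mother_educ_source`, the helper `harmonise_mother_educ()`, and the toy input data frames are all hypothetical, and only the two raw variable codes and the output name “mother_educ” come from the text.

```r
# Hypothetical lookup: which raw column holds mother's education in each year.
# The two codes are those quoted in the text; NA marks a year with no record.
mother_educ_source <- c(
  "2000" = NA_character_,
  "2003" = "ST11R01", "2006" = "ST11R01",
  "2009" = "ST11R01", "2012" = "ST11R01",
  "2015" = "ST005Q01TA", "2018" = "ST005Q01TA"
)

# Copy the year-specific raw column into an integer column called mother_educ
# and drop the raw column; fill with NA when the variable is unavailable.
harmonise_mother_educ <- function(raw, year) {
  source_col <- mother_educ_source[[as.character(year)]]
  if (is.na(source_col) || !source_col %in% names(raw)) {
    raw$mother_educ <- rep(NA_integer_, nrow(raw))
  } else {
    raw$mother_educ <- as.integer(raw[[source_col]])
    raw[[source_col]] <- NULL
  }
  raw
}

# Toy inputs standing in for two raw PISA files (values are made up).
raw_2012 <- data.frame(student_id = 1:3, ST11R01 = c(1, 4, 2))
raw_2018 <- data.frame(student_id = 1:3, ST005Q01TA = c(3, 1, 5))

clean_2012 <- harmonise_mother_educ(raw_2012, 2012)
clean_2018 <- harmonise_mother_educ(raw_2018, 2018)
```

After this step both years expose the same column name, so the yearly files can be stacked or compared without year-specific code.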
learningtower is an easy-to-use R package that provides quick access to a variety of variables from the OECD PISA data, collected every three years from 2000 to 2018. The data include PISA test scores in mathematics, reading, and science. They also include information on students’ socioeconomic background, on their schools and school facilities, and on the nations participating in the program.
The motivation for developing the learningtower package was sparked by the announcement of the PISA 2018 results, which caused a collective wringing of hands in the Australian press, with headlines such as “Vital Signs: Australia’s slipping student scores will lead to greater income inequality” and “In China, Nicholas studied maths 20 hours a week. In Australia, it’s three”. That’s when several academics from Australia, New Zealand, and Indonesia decided to make things easier by providing easy access to PISA scores as part of the ROpenSci OzUnconf, which was held in Sydney from December 11 to 13, 2019. The data from this survey, as well as from all other surveys performed since the initial collection in 2000, are freely accessible to the public. However, downloading and curating data across multiple years of the PISA study can be a time-consuming task. As a result, we have made a more convenient subset of the data freely available in a new R package called learningtower, along with sample code for analysis.
The learningtower package primarily comprises three datasets: student, school, and countrycode. The student dataset includes results from triennial testing of 15-year-old students throughout the world. This dataset also includes information about their parents’ education, family wealth, gender, and possession of computers, internet, vehicles, books, rooms, desks, and other comparable items. Due to the size limitation on CRAN packages, only a subset of the student data can be made available in the downloaded package. These subsets of the student data, named student_subset_yyyy (yyyy being the specific year of the study), allow users to quickly load the data and visualise the trends in the full data. The full student dataset can be downloaded using the load_student() function included in this package. The school dataset includes school weight as well as other information such as school funding distribution, whether the school is private or public, enrollment of boys and girls, school size, and similar characteristics of the schools these 15-year-olds attend around the world. The countrycode dataset includes a mapping of a country/region’s ISO code to its full name.
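As a brief sketch of how these datasets might be loaded, the snippet below uses the dataset names and the load_student() function mentioned above. It assumes the learningtower package is installed from CRAN; the check stored in `has_learningtower` is an addition for illustration so the code does nothing when the package is absent, and the download call is left commented out because it needs an internet connection.

```r
# Only run the example when the package is available on this machine.
has_learningtower <- requireNamespace("learningtower", quietly = TRUE)

if (has_learningtower) {
  library(learningtower)

  # Bundled subset of the 2018 student data (a sample, not the full survey).
  data(student_subset_2018)
  print(head(student_subset_2018))

  # School-level data and the country code lookup ship with the package.
  data(school)
  data(countrycode)
  print(head(countrycode))

  # The full student data for a given year are fetched on demand.
  # Uncomment to download (requires internet access):
  # student_2018 <- load_student("2018")
}
```

The bundled subsets are convenient for exploring structure and drafting plots; analyses intended to represent a whole country would normally use the full data returned by load_student().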
The learningtower developers are committed to providing R users with data to analyse PISA results every three years. Planned enhancements include updating the package every time additional PISA scores are announced. Note that, in order to account for post-COVID-19 difficulties, OECD member nations and associates decided to postpone the PISA 2021 assessment to 2022 and the PISA 2024 assessment to 2025.
In this section we will illustrate how the learningtower package can be utilized to answer some research questions by applying various methodologies and statistical computations on the learningtower datasets.
We will solely utilize the 2018 PISA data and scores for illustrative purposes throughout the example analysis section. During the post-development phase, the learningtower developers collectively decided to answer a few intriguing questions on the PISA data and see if we could identify any interesting trends or insights utilizing this dataset. Some of these questions include: Is there a significant gender difference between girls and boys, and who performs better in each of the three areas of mathematics, reading, and science? Do the various socioeconomic characteristics reflected in the student data have a substantial impact on the scores of these 15-year-olds? Furthermore, we will delve into Australia’s score history and temporal trend to uncover some noteworthy patterns that Australia has observed as a result of its participation in the PISA experiment.
Gender gaps have always been a topic of interest among researchers. For PISA data and the scores of 15-year-old students around the world, uncovering patterns by gender can give education policymakers meaningful insights. Based on the 2018 PISA results, let us see if there is a major gender disparity between girls and boys throughout the world in mathematics, reading, and science. To begin, we will create a ‘data.frame’ that stores the weighted average maths score for each nation and region, grouped by country and gender. Survey weights are critical and must be used in the analysis to guarantee that each sampled student accurately represents the total number of pupils in the PISA population. In addition, we compute the gender difference between the two averages. To demonstrate the variability in the mean estimate, we use bootstrap sampling with replacement on the data and compute the same mean difference estimate. For each nation, the empirical 90 percent confidence intervals are presented. The same process is used for reading and science test scores.
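The computation described above can be sketched in base R as follows. This is an illustration under stated assumptions rather than the code used for the paper: the data are simulated, the column names (country, gender, math, stu_wgt) are assumed to mirror those in the student data, the helper functions `gender_gap()` and `boot_gap()` are hypothetical, and the bootstrap is a simple student-level resampling that ignores the finer details of the PISA sampling design.

```r
set.seed(2018)

# Simulated stand-in for the student data; column names are assumptions.
n <- 400
toy <- data.frame(
  country = rep(c("AAA", "BBB"), each = n / 2),
  gender  = sample(c("female", "male"), n, replace = TRUE),
  stu_wgt = runif(n, 0.5, 2)
)
toy$math <- rnorm(n, mean = ifelse(toy$gender == "male", 505, 495), sd = 90)

# Weighted girls-minus-boys difference in mean score for one country's rows.
# A positive value means girls scored higher on average.
gender_gap <- function(d, score = "math") {
  girls <- d[d$gender == "female", ]
  boys  <- d[d$gender == "male", ]
  weighted.mean(girls[[score]], girls$stu_wgt, na.rm = TRUE) -
    weighted.mean(boys[[score]], boys$stu_wgt, na.rm = TRUE)
}

# Resample students with replacement, recompute the gap each time, and
# take empirical quantiles of the resampled gaps as the interval.
boot_gap <- function(d, score = "math", n_boot = 500, level = 0.90) {
  reps <- replicate(n_boot, {
    idx <- sample(nrow(d), replace = TRUE)
    gender_gap(d[idx, ], score)
  })
  alpha <- (1 - level) / 2
  ci <- quantile(reps, c(alpha, 1 - alpha), names = FALSE, na.rm = TRUE)
  data.frame(estimate = gender_gap(d, score), lower = ci[1], upper = ci[2])
}

# One row per country: point estimate plus a 90 percent interval.
gaps <- do.call(rbind, lapply(split(toy, toy$country), function(d) {
  data.frame(country = d$country[1], boot_gap(d))
}))
print(gaps)
```

Passing `score = "read"` or `score = "science"` to the same functions would repeat the calculation for the other two subjects, provided those columns exist in the data.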
Figure 1: The chart above depicts the gender gap in 15-year-olds’ math, reading, and science results in 2018. Estimates to the right of the red line indicate that girls performed better, while estimates to the left of the red line indicate that boys performed better. One of the most intriguing conclusions we can draw from this chart is that in the PISA experiment in 2018, girls outperformed boys in reading in every participating nation.
Figure 1 illustrates the global disparities in mean math, reading, and science outcomes. Before we get to the conclusions, let’s have a look at what has been plotted. The red line marks a gender difference of zero: estimates to the right of the red line indicate that girls outperformed boys in math, reading, or science, and estimates to the left indicate that boys outperformed girls. Based on figure 1, because most math estimates and confidence intervals lie to the left of the red line, we may conclude that boys outperformed girls in math in most countries. In nations such as Morocco, Netherlands, Slovenia, Poland, Bulgaria, and Greece, there is almost no gender difference in math outcomes. When we look at the reading scores, we notice a really interesting detail: girls outperformed boys in reading in all countries in 2018. The largest reading gaps in favour of girls were in Qatar, the United Arab Emirates, and Finland. Looking further into the science plot, we see an unexpected pattern: most nations have very little gender difference in science scores, implying that boys and girls perform about equally well in science in most countries. The science gap most favours boys in Peru, Colombia, and regions of China, and most favours girls in Qatar, the United Arab Emirates, and Jordan. Figure 1 thus depicts the gender gap in math, reading, and science for all nations and regions that took part in the 2018 PISA experiment.
Figure 1 gave us meaningful insights into the gender gap between girls and boys throughout the world. Because this is a geographical question, displaying the findings on world maps will help us better comprehend the score differences in the three educational disciplines. Let us continue to investigate patterns and correlations using map visualisation. To illustrate the gender gap between girls and boys throughout the world, we utilize the map_data function to get the latitude and longitude coordinates needed to construct a map for our data. We join these latitude and longitude coordinates to our PISA data and render the world map using the geom_polygon function from ggplot2.
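A minimal sketch of this join-and-draw step is given below. It assumes the ggplot2 and maps packages are installed; the helper `join_gap_to_map()`, the `gap` column, and the two gap values are made up for illustration. Country names in PISA data may not match the region names used by the map data, so some recoding may be needed before joining. The sketch produces a static map only.

```r
# Attach gap estimates to the polygon vertices of each region, then restore
# the drawing order so that the polygons are traced correctly.
join_gap_to_map <- function(world, gaps) {
  joined <- merge(world, gaps, by = "region", all.x = TRUE)
  joined[order(joined$group, joined$order), ]
}

# Made-up gap values (girls minus boys); not real PISA estimates.
gaps <- data.frame(region = c("Australia", "Finland"), gap = c(-6, 6))

if (requireNamespace("ggplot2", quietly = TRUE) &&
    requireNamespace("maps", quietly = TRUE)) {
  library(ggplot2)

  # map_data() returns long, lat, group, order, region and subregion columns.
  world <- map_data("world")
  world_gap <- join_gap_to_map(world, gaps)

  # Diverging fill centred at zero: one colour where boys lead, another
  # where girls lead, and grey for regions without data.
  p <- ggplot(world_gap, aes(x = long, y = lat, group = group, fill = gap)) +
    geom_polygon(colour = "grey70") +
    scale_fill_gradient2(low = "steelblue", mid = "white", high = "firebrick",
                         midpoint = 0, na.value = "grey90") +
    coord_quickmap() +
    theme_void()
  # print(p) draws the map.
}
```

Sorting by group and order after the merge matters because merge() does not preserve row order, and geom_polygon() connects vertices in the order they appear.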
Figure 2: Interactive maps that show the gender gap in math, reading, and science results between girls and boys throughout the world. The interactive aspect of the map allows one to move the cursor around the global map, which displays the country name as well as the gender gap score between girls and boys. A positive score for a country indicates that girls outperformed boys in that country, whereas a negative score indicates that boys outperformed girls in that country. The diverging colour scale makes it possible to read the range of scores and helps us interpret the gender gap among these students across the globe. The reading scores all have positive values, indicating that girls outperformed boys across the world in the year 2018.